
Conversation

nikita-savelyevv (Collaborator) commented on Jun 18, 2025

Changes

Added optimized weight compression to the MXFP4 data type for the OpenVINO backend.

| Model | Memory Before (MiB) | Memory After (MiB) | Time Before (sec) | Time After (sec) |
| --- | --- | --- | --- | --- |
| llama-3.2-1b bf16 | 2778.55 | 2548.37 (-8.29%) | 66.33 | 24.95 (-62.40%) |
| llama-3.2-1b fp16 | 3434.61 | 2963.03 (-13.73%) | 62.12 | 24.85 (-59.98%) |
| llama-3.2-1b fp32 | 2041.79 | 1576.43 (-22.81%) | 62.72 | 25.85 (-58.77%) |
| phi4-mini bf16 | 6384.81 | 5725.66 (-10.33%) | 197.36 | 66.22 (-66.44%) |
| phi4-mini fp16 | 8863.53 | 8375.75 (-5.51%) | 195.85 | 66.93 (-65.82%) |
| phi4-mini fp32 | 4406.82 | 3897.91 (-11.54%) | 195.25 | 68.83 (-64.76%) |
| llama-3.1-8b bf16 | 7297.72 | 6096.06 (-16.46%) | 413.86 | 135.25 (-67.32%) |
| llama-3.1-8b fp16 | 7946.93 | 8311.89 (+4.58%) | 413.64 | 136.05 (-67.11%) |
| llama-3.1-8b fp32 | 7310.48 | 5043.89 (-30.03%) | 411.45 | 140.66 (-65.81%) |
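
For readers unfamiliar with the API, the sketch below shows how this compression path would typically be invoked. It is illustrative and not part of the PR: the mode name `nncf.CompressWeightsMode.MXFP4` is an assumption based on this PR's final title (earlier NNCF releases exposed the format as `E2M1`, matching the original f4e2m1 naming), and the model paths are hypothetical.

```python
# Minimal sketch: MXFP4 weight compression of an OpenVINO model with NNCF.
# Assumptions: mode name MXFP4 (possibly E2M1 in older releases); paths are hypothetical.
import nncf
import openvino as ov

core = ov.Core()
model = core.read_model("llama-3.2-1b/openvino_model.xml")  # hypothetical path

compressed = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.MXFP4,  # assumed name; formerly f4e2m1/E2M1
    group_size=32,  # MX formats share one scale per 32-element block
)
ov.save_model(compressed, "llama-3.2-1b-mxfp4.xml")
```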

Reason for changes

Improving user experience: MXFP4 weight compression becomes roughly 3x faster and, in most configurations, less memory-intensive (see the table above).

Related tickets

164717

Tests

Extended optimized compression tests.

nikita-savelyevv changed the title from "Optimized openvino compression to f4e2m1 data type" to "Optimized openvino weights compression to f4e2m1 data type" on Jun 18, 2025
github-actions bot added the NNCF OpenVINO and NNCF PTQ labels on Jun 18, 2025
github-actions bot removed the NNCF PTQ label on Jul 29, 2025
nikita-savelyevv changed the title to "[OV] Optimized compression to f4e2m1 data type" on Jul 29, 2025
nikita-savelyevv changed the title to "[OV] Optimized compression to MXFP4 data type" on Oct 8, 2025
nikita-savelyevv marked this pull request as ready for review on October 9, 2025 at 08:45
nikita-savelyevv requested a review from a team as a code owner on October 9, 2025 at 08:45
Comment on lines 111 to 112
For NF4 quantization quantizes the weights to 16 levels on [-1, 1] interval.
TODO(nikita-savelyevv): add support for MXFP4 and MXFP8_E4M3 once ticket 164851 is resolved
For MXFP4 quantization quantizes the weights to 16 levels on [-6, 6] interval.
Contributor:
Suggested change:
- For NF4 quantization quantizes the weights to 16 levels on [-1, 1] interval.
- TODO(nikita-savelyevv): add support for MXFP4 and MXFP8_E4M3 once ticket 164851 is resolved
- For MXFP4 quantization quantizes the weights to 16 levels on [-6, 6] interval.
+ NF4 format uses 16 levels in [-1, 1] range, while MXFP4 uses 16 levels in [-6, 6].

Collaborator (Author):

Done
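
As background for the suggested wording (illustrative, not from the PR): the E2M1 element format underlying MXFP4 has 1 sign bit, 2 exponent bits, and 1 mantissa bit, giving 16 codes whose values span [-6, 6].

```python
# Illustrative: enumerate the values an MXFP4 (E2M1) element can represent.
# 16 codes total; +0 and -0 encode the same numeric value.
magnitudes = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # 2^e * (1 + m/2), plus subnormals
codes = [sign * m for sign in (1.0, -1.0) for m in magnitudes]
print(sorted(set(codes)))
# [-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```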

assert isinstance(res_nncf, Tensor)
if (
self.backend() != TensorBackend.tf
): # native Tensorflow operaors do not guarantee to return a tensor on an initial device.
Contributor:
Suggested change:
- ): # native Tensorflow operaors do not guarantee to return a tensor on an initial device.
+ ): # native Tensorflow operators do not guarantee to return a tensor on an initial device.

Collaborator (Author):

Done

Labels

Code Freeze, NNCF OpenVINO